Below is a scatterplot comparing GDP per capita with CO2 emissions (metric tons per capita) in 1962. The correlation coefficient between the two variables and its associated p-value are annotated on the plot.
## Warning: Removed 151 rows containing non-finite values (`stat_cor()`).
## Warning: Removed 151 rows containing missing values (`geom_point()`).
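The correlation and p-value shown on a plot like this can also be computed directly. The following is a minimal base R sketch, assuming a Pearson correlation; the data frame `toy` and its column names are hypothetical stand-ins, and its values are illustrative rather than taken from the dataset.

```r
# Illustrative values only; NA marks a missing measurement.
toy <- data.frame(
  gdp_per_capita = c(500, 1200, 3000, 8000, NA, 15000, 22000, 400),
  co2_per_capita = c(0.1, 0.6, 1.9, NA, 2.5, 9.8, 14.2, 0.2)
)

# Keep only rows where both variables are present, mirroring the rows
# that the plotting layers drop with a warning.
complete <- toy[complete.cases(toy), ]

# Pearson correlation test: returns the estimate and its p-value.
result <- cor.test(complete$gdp_per_capita, complete$co2_per_capita,
                   method = "pearson")
r_value <- unname(result$estimate)
p_value <- result$p.value
```

Filtering incomplete rows first makes explicit how many observations the reported correlation is based on.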
Below is the year in which the correlation between GDP per capita and CO2 emissions in metric tons per capita was the strongest.
## [1] 1967
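One way to find such a year is to compute the correlation separately for each year and pick the largest. The sketch below uses base R on a hypothetical data frame `toy` with made-up values, and it assumes that "strongest" means the largest absolute correlation; the original analysis may have defined it differently.

```r
# Illustrative long-format data: three years, five observations each.
toy <- data.frame(
  year = rep(c(1962, 1967, 1972), each = 5),
  gdp  = c(1, 2, 3, 4, 5,   1, 2, 3, 4, 5,   1, 2, 3, 4, 5),
  co2  = c(2, 1, 4, 3, 5,   1, 2, 3, 4, 6,   5, 1, 4, 2, 3)
)

# Correlation of gdp and co2 within each year, using complete pairs only.
cor_by_year <- sapply(split(toy, toy$year), function(d) {
  cor(d$gdp, d$co2, use = "complete.obs")
})

# Year whose correlation has the largest absolute value.
strongest_year <- as.numeric(names(which.max(abs(cor_by_year))))
```

Taking the absolute value treats a strong negative correlation as strong too; drop `abs()` if only positive correlations are of interest.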
Below is an interactive plot of GDP per capita and CO2 emissions (metric tons per capita) in 1967, where the size of each point is scaled by the country’s population and the points are color-coded by continent.
## Warning: Removed 146 rows containing non-finite values (`stat_cor()`).
Below we investigate the relationship between continent and energy use (kg of oil equivalent per capita) across all years in the dataset. Because continent is a categorical variable with more than two levels and energy use is a quantitative variable, we start with a one-way ANOVA to test whether mean energy use differs across continents.
##              Df    Sum Sq   Mean Sq F value Pr(>F)
## continent     4 7.715e+08 192870621   51.46 <2e-16 ***
## Residuals   843 3.160e+09   3748033
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 436 observations deleted due to missingness
Since the p-value (< 2e-16) is far below the 0.05 significance level, we conclude that mean energy use differs significantly across continents in this dataset. To see which continents differ from one another, we now use Tukey's HSD test for pairwise comparisons of mean energy use between continents.
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = `Energy use (kg of oil equivalent per capita)` ~ continent, data = withCountry)
##
## $continent
##                       diff       lwr       upr     p adj
## Americas-Africa  1005.1037  466.8326 1543.3748 0.0000041
## Asia-Africa      1168.7636  628.2529 1709.2742 0.0000000
## Europe-Africa    2447.5453 1947.3838 2947.7067 0.0000000
## Oceania-Africa   3281.7976 2040.3410 4523.2543 0.0000000
## Asia-Americas     163.6599 -384.4160  711.7357 0.9256447
## Europe-Americas  1442.4416  934.1141 1950.7691 0.0000000
## Oceania-Americas 2276.6940 1031.9249 3521.4630 0.0000069
## Europe-Asia      1278.7817  768.0833 1789.4801 0.0000000
## Oceania-Asia     2113.0341  867.2950 3358.7732 0.0000402
## Oceania-Europe    834.2524 -394.5176 2063.0223 0.3421942
As seen above, two pairs have an adjusted p-value above 0.05: Asia-Americas and Oceania-Europe. This is consistent with a three-way clustering of energy use by continent: one cluster consisting solely of Africa, a second of Asia and the Americas, and a third of Europe and Oceania. Within-cluster similarity is illustrated by the boxplot below.
## Warning: Removed 436 rows containing non-finite values (`stat_boxplot()`).
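The ANOVA and Tukey HSD steps above can be sketched in base R as follows. This is a minimal illustration on simulated data: the data frame `toy`, the group labels `A`, `B`, `C`, and the column `energy` are hypothetical and do not come from the dataset analyzed here.

```r
# Simulated energy-use values for three hypothetical groups.
set.seed(42)
toy <- data.frame(
  continent = factor(rep(c("A", "B", "C"), each = 30)),
  energy    = c(rnorm(30, mean = 500,  sd = 100),
                rnorm(30, mean = 550,  sd = 100),
                rnorm(30, mean = 2000, sd = 100))
)

# One-way ANOVA: does mean energy use differ across groups?
fit <- aov(energy ~ continent, data = toy)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Tukey HSD: which pairs of groups differ, at a 95% family-wise level?
tukey <- TukeyHSD(fit, "continent", conf.level = 0.95)
pairwise <- tukey$continent   # matrix with columns diff, lwr, upr, p adj

# Pairs for which the difference in means is not significant at 0.05.
not_significant <- rownames(pairwise)[pairwise[, "p adj"] > 0.05]
```

A pair landing in `not_significant` only means the test found no evidence of a difference; it does not by itself show that the two groups are the same.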
Below we determine whether there is a significant difference between Europe and Asia with respect to imports of goods and services (% of GDP). Because we are comparing only two levels of the categorical continent variable and imports of goods and services (% of GDP) is a quantitative variable, we use a two-sample t-test.
## # A tibble: 1 × 1
## pval
## <dbl>
## 1 0.0412
Since the p-value (0.0412) is below 0.05, there is a statistically significant difference between Europe and Asia with respect to imports of goods and services (% of GDP) at the 5% significance level.
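A two-sample comparison of this kind can be sketched in base R with `t.test()`. The data below are simulated and the names `toy` and `imports` are hypothetical. The sketch uses Welch's t-test, which is R's default (`var.equal = FALSE`); the report does not state which variant was used, so that choice is an assumption.

```r
# Simulated import shares for two groups; values are illustrative only.
set.seed(7)
toy <- data.frame(
  continent = rep(c("Europe", "Asia"), each = 40),
  imports   = c(rnorm(40, mean = 40, sd = 8),
                rnorm(40, mean = 30, sd = 8))
)

# Welch two-sample t-test comparing mean imports between the two groups.
tt <- t.test(imports ~ continent, data = toy)
p_value <- tt$p.value

# Decision at the 5% significance level.
significant <- p_value < 0.05
```

Welch's variant does not assume equal variances in the two groups, which is usually the safer default when group sizes or spreads differ.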
Below are the two countries tied for highest average population density per sq km of land area across all years within the dataset.
## # A tibble: 1 × 1
## `Country Name`
## <chr>
## 1 Macao SAR, China
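A ranking like this can be sketched in base R by averaging each country's density over all years and keeping every country that attains the maximum. The data frame `toy`, its columns, and its values are hypothetical and chosen only to show how ties are handled.

```r
# Illustrative data: two observations per country, one of them missing.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  density = c(100, 300, 250, 150, 50, NA)
)

# Mean density per country across years, ignoring missing values.
mean_density <- tapply(toy$density, toy$country, mean, na.rm = TRUE)

# Keep every country whose mean equals the maximum, so ties are preserved.
top <- names(mean_density)[mean_density == max(mean_density)]
```

Comparing against `max()` rather than calling `which.max()` matters here: `which.max()` returns only the first maximum, so a tied country would be silently dropped.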
Below is the country with the largest increase in life expectancy from 1962 to 2007.
## # A tibble: 1 × 1
## `Country Name`
## <chr>
## 1 Maldives
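The change in life expectancy between two years can be computed by pairing each country's 1962 and 2007 values and subtracting. The following base R sketch uses a hypothetical data frame `toy` with made-up values; countries missing either year yield `NA` and are skipped by `which.max()`.

```r
# Illustrative data: one row per country and year.
toy <- data.frame(
  country  = rep(c("A", "B", "C"), each = 2),
  year     = rep(c(1962, 2007), times = 3),
  life_exp = c(40, 70, 60, 80, 55, NA)
)

# Split into the two years and join them on country.
start <- toy[toy$year == 1962, c("country", "life_exp")]
end   <- toy[toy$year == 2007, c("country", "life_exp")]
both  <- merge(start, end, by = "country", suffixes = c("_1962", "_2007"))

# Gain in life expectancy; NA where either year is missing.
both$gain <- both$life_exp_2007 - both$life_exp_1962

# Country with the largest gain (which.max ignores NA values).
top <- both$country[which.max(both$gain)]
```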